VNTRseek—a computational tool to detect tandem repeat variants in high-throughput sequencing data
Authors
Abstract
DNA tandem repeats (TRs) are ubiquitous genomic features which consist of two or more adjacent copies of an underlying pattern sequence. The copies may be identical or approximate. Variable number of tandem repeats or VNTRs are polymorphic TR loci in which the number of pattern copies is variable. In this paper we describe VNTRseek, our software for discovery of minisatellite VNTRs (pattern size ≥ 7 nucleotides) using whole genome sequencing data. VNTRseek maps sequencing reads to a set of reference TRs and then identifies putative VNTRs based on a discrepancy between the copy number of a reference and its mapped reads. VNTRseek was used to analyze the Watson and Khoisan genomes (454 technology) and two 1000 Genomes family trios (Illumina). In the Watson genome, we identified 752 VNTRs with pattern sizes ranging from 7 to 84 nt. In the Khoisan genome, we identified 2572 VNTRs with pattern sizes ranging from 7 to 105 nt. In the trios, we identified between 2660 and 3822 VNTRs per individual and found nearly 100% consistency with Mendelian inheritance. VNTRseek is, to the best of our knowledge, the first software for genome-wide detection of minisatellite VNTRs. It is available at http://orca.bu.edu/vntrseek/.
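The core idea in the abstract, comparing the copy number of a reference TR with the copy numbers seen in the reads mapped to it, can be illustrated with a small sketch. This is not VNTRseek's implementation: the function name `call_putative_vntrs`, the input layout, the use of integer copy numbers, and the `min_support` threshold are all assumptions introduced here for illustration.

```python
from collections import Counter, defaultdict


def call_putative_vntrs(reference_copies, read_observations, min_support=2):
    """Flag reference TRs whose mapped reads show a different copy number.

    reference_copies: dict mapping a TR id to its copy number in the reference.
    read_observations: iterable of (tr_id, copy_number) pairs, one per mapped read.
    min_support: hypothetical threshold, the number of reads that must agree
        on a non-reference copy number before it is reported.

    Returns a dict mapping TR id to a sorted list of copy-number differences
    (observed minus reference) for each supported non-reference allele.
    """
    # Count how many reads support each observed copy number per TR.
    counts = defaultdict(Counter)
    for tr_id, copies in read_observations:
        if tr_id in reference_copies:  # ignore reads mapped to unknown TRs
            counts[tr_id][copies] += 1

    putative = {}
    for tr_id, allele_counts in counts.items():
        ref = reference_copies[tr_id]
        # Keep only alleles that differ from the reference and have support.
        variant = sorted(
            copies - ref
            for copies, n_reads in allele_counts.items()
            if copies != ref and n_reads >= min_support
        )
        if variant:
            putative[tr_id] = variant
    return putative


# Toy data (invented for this example, not taken from the paper).
reference = {"tr1": 5, "tr2": 3}
reads = [("tr1", 5), ("tr1", 5), ("tr1", 7), ("tr1", 7), ("tr2", 3), ("tr2", 4)]
print(call_putative_vntrs(reference, reads))
```

In this toy input, `tr1` is reported with a gain of two copies because two reads agree on it, while the single read suggesting a change at `tr2` falls below the assumed support threshold and is ignored.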
Similar resources
Sequencing technologies and tools for short tandem repeat variation detection
Short tandem repeats are highly polymorphic and associated with a wide range of phenotypic variation, some of which cause neurodegenerative disease in humans. With advances in high-throughput sequencing technologies, there are novel opportunities to study genetic variation. While available sequencing technologies and bioinformatics tools provide options for mining high-throughput sequencing dat...
Digital fragment analysis of short tandem repeats by high‐throughput amplicon sequencing
High-throughput sequencing has been proposed as a method to genotype microsatellites and overcome the four main technical drawbacks of capillary electrophoresis: amplification artifacts, imprecise sizing, length homoplasy, and limited multiplex capability. The objective of this project was to test a high-throughput amplicon sequencing approach to fragment analysis of short tandem repeats and ch...
STEAK: A specific tool for transposable elements and retrovirus detection in high-throughput sequencing data
The advancements of high-throughput genomics have unveiled much about the human genome highlighting the importance of variations between individuals and their contribution to disease. Even though numerous software have been developed to make sense of large genomics datasets, a major short falling of these has been the inability to cope with repetitive regions, specifically to validate structura...
UPDtool: a tool for detection of iso- and heterodisomy in parent-child trios using SNP microarrays
UPDtool is a computational tool for detection and classification of uniparental disomy (UPD) in trio SNP-microarray experiments. UPDs are rare events of chromosomal malsegregation and describe the condition of two homologous chromosomes or homologous chromosomal segments that were inherited from one parent. The occurrence of UPD can be of major clinical relevance. Though high-through...
The Accuracy, Feasibility and Challenges of Sequencing Short Tandem Repeats Using Next-Generation Sequencing Platforms
To date we have little knowledge of how accurate next-generation sequencing (NGS) technologies are in sequencing repetitive sequences beyond known limitations to accurately sequence homopolymers. Only a handful of previous reports have evaluated the potential of NGS for sequencing short tandem repeats (microsatellites) and no empirical study has compared and evaluated the performance of more th...
Journal title:
Volume 42, Issue
Pages -
Publication date 2014